Transcriber Driving Strategies for Transcription Aid System

نویسندگان

  • Grégory Senay
  • Georges Linarès
  • Benjamin Lecouteux
  • Stanislas Oger
  • Thierry Michel
چکیده

Speech recognition technology suffers from a lack of robustness which limits its usability for fully automated speech-to-text transcription, and manual correction is generally required to obtain perfect transcripts. In this paper, we propose a general scheme for semi-automatic transcription, in which the system and the transcriptionist contribute jointly to the speech transcription. The proposed system relies on the editing of confusion networks and on reactive decoding, the latter one being supposed to take benefits from the manual correction and improve the error rates. In order to reduce the correction time, we evaluate various strategies aiming to guide the transcriptionist towards the critical areas of transcripts. These strategies are based on graph density-based criterion and two semantic consistency criterion; using a corpus-based method and a web-search engine. They allow to indicate to the user the areas which present severe lacks of understandability. We evaluate these driving strategies by simulating the correction process of French broadcast news transcriptions. Results show that interactive decoding improves the correction act efficiency with all driving strategies and semantic information must be integrated into the interactive decoding process.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Consistency in transcription and labelling of German intonation with GToBI

A diverse set of speech data was labelled in three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber-consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.

متن کامل

Automatic estimation of transcription accuracy and difficulty

Managing a large-scale speech transcription task with a team of human transcribers requires effective quality control and workload distribution. As it becomes easier and cheaper to collect massive audio corpora the problem is magnified. Relying on expert review or transcribing all speech multiple times is impractical. Furthermore, speech that is difficult to transcribe may be better handled by ...

متن کامل

Evaluation Framework for Automatic Singing Transcription

In this paper, we analyse the evaluation strategies used in previous works on automatic singing transcription, and we present a novel, comprehensive and freely available evaluation framework for automatic singing transcription. This framework consists of a cross-annotated dataset and a set of extended evaluation measures, which are integrated in a Matlab toolbox. The presented evaluation measur...

متن کامل

Transcribing Speech: Errors in Corpora and Experimental Settings

Administrations, government organs, judiciary courts always faced the problem of defining limits in transcription practices. Nowadays corpus linguistics and computational linguistics have focused their attention on spoken corpora as indispensable tools for descriptive linguistics, as well as for applied purposes (in speech technologies, such as text-to-speech and speech recognition, in dialogue...

متن کامل

Decision-Based Transcription of Jazz Guitar Solos Using a Harmonic Bident Analysis Filter Bank and Spectral Distribution Weighting

Jazz guitar solos are improvised melody lines played on one instrument on top of a chordal accompaniment (comping). As the improvisation happens spontaneously, a reference score is non-existent, only a lead sheet. There are situations, however, when one would like to have the original melody lines in the form of notated music, see the Real Book. The motivation is either for the purpose of pract...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010